Kickstarter is an American public-benefit corporation based in Brooklyn, New York, that maintains a global crowdfunding platform focused on creativity. The company’s stated mission is to “help bring creative projects to life”. For this assignment, we analyze the descriptions of Kickstarter projects to identify commonalities of successful (and unsuccessful) projects using text mining techniques.
- State (state): Whether a campaign was successful or not.
- Pledged Amount (pledged)
- Achievement Ratio: Create a variable achievement_ratio by calculating the percentage of the original monetary goal reached by the actual amount pledged.
- Number of backers (backers_count)
- How quickly the goal was reached (difference between launched_at and state_changed_at) for those campaigns that were successful.

Use one or more of these measures to visually summarize which categories were most successful in attracting funding on Kickstarter. Briefly summarize your findings.
Based on the output above, the average success rate does not differ much across most project categories. However, projects in categories such as Dance, Comics, Publishing, Music, and Theater seem more likely to succeed.
On the other hand, a few clear strong performers emerge when looking at achievement ratio and average number of backers. By average backer count, the top-performing categories are Games and Design, followed by Technology. By average achievement ratio, Design and Games are in the lead. Other patterns are interesting too. For instance, Comics looks much stronger on average success rate than on average achievement ratio or average backer count.
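The three measures compared above can be computed per category in a few lines. The following is a minimal Python sketch, not the code behind the report: it assumes each campaign is a record with the fields named in the assignment plus a `goal` and a `category` field, that `launched_at` and `state_changed_at` are Unix timestamps in seconds, and it uses made-up sample records.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400

# Made-up sample records. Field names follow the assignment text; `goal`,
# `category`, and the Unix-second timestamps are assumptions of this sketch.
campaigns = [
    {"category": "Games", "state": "successful", "goal": 1000.0, "pledged": 2500.0,
     "backers_count": 120, "launched_at": 1_500_000_000, "state_changed_at": 1_502_592_000},
    {"category": "Games", "state": "failed", "goal": 5000.0, "pledged": 500.0,
     "backers_count": 10, "launched_at": 1_500_000_000, "state_changed_at": 1_502_592_000},
    {"category": "Dance", "state": "successful", "goal": 2000.0, "pledged": 2200.0,
     "backers_count": 40, "launched_at": 1_500_000_000, "state_changed_at": 1_501_728_000},
]


def achievement_ratio(pledged, goal):
    """Percentage of the original goal reached by the pledged amount."""
    return 100.0 * pledged / goal


def days_to_goal(campaign):
    """Days between launch and state change; None for unsuccessful campaigns."""
    if campaign["state"] != "successful":
        return None
    return (campaign["state_changed_at"] - campaign["launched_at"]) / SECONDS_PER_DAY


def summarize_by_category(rows):
    """Average success rate, achievement ratio, and backer count per category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row)
    summary = {}
    for category, members in groups.items():
        n = len(members)
        summary[category] = {
            "success_rate": sum(m["state"] == "successful" for m in members) / n,
            "avg_achievement_ratio": sum(
                achievement_ratio(m["pledged"], m["goal"]) for m in members
            ) / n,
            "avg_backers": sum(m["backers_count"] for m in members) / n,
        }
    return summary


summary = summarize_by_category(campaigns)
```

Sorting `summary` by each of the three keys in turn gives the kind of per-measure ranking discussed above, and shows how a category can rank differently depending on the measure chosen.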
BONUS ONLY: b) Success by Location
Now, use the location information to calculate the total number of successful projects by state (if you are ambitious, normalize by population). Also, identify the Top 50 “innovative” cities in the U.S. (by whatever measure you find plausible). Provide a leaflet map showing the most innovative states and cities in the U.S. on a single map based on this information.
I first generated longitudes and latitudes for the states and cities, then merged these data with the GeoJSON spatial data file to build the maps.
The leaflet map embeds two maps, generated from the lists of top states and top cities. One can select the view by city or by state. The radius of each city marker shows the number of projects in that city, and the depth of each state's color shows the number of projects there, i.e. darker blue means more projects in the state. States with no projects are shaded gray.
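The counts behind such a map, and the optional normalization by population, can be sketched as follows. This is a minimal Python illustration, not the code used for the map: the field names `us_state` and `city`, the placeholder state codes, and the population figures are all made up, and "number of successful projects" is just one plausible measure of an innovative city.

```python
from collections import Counter

# Made-up records with placeholder state codes and city names.
projects = [
    {"state": "successful", "us_state": "AA", "city": "Alpha"},
    {"state": "successful", "us_state": "AA", "city": "Alpha"},
    {"state": "successful", "us_state": "AA", "city": "Alpha"},
    {"state": "successful", "us_state": "AA", "city": "Gamma"},
    {"state": "failed", "us_state": "AA", "city": "Alpha"},
    {"state": "successful", "us_state": "BB", "city": "Beta"},
    {"state": "successful", "us_state": "BB", "city": "Beta"},
]

# Placeholder populations, not real census figures.
population = {"AA": 2_000_000, "BB": 500_000}


def successes_by_state(rows):
    """Total number of successful projects per state."""
    return Counter(r["us_state"] for r in rows if r["state"] == "successful")


def per_million(counts, pop):
    """Successful projects per million residents, for states with a known population."""
    return {s: 1_000_000 * n / pop[s] for s, n in counts.items() if s in pop}


def top_cities(rows, n):
    """The n (city, state) pairs with the most successful projects."""
    counts = Counter(
        (r["city"], r["us_state"]) for r in rows if r["state"] == "successful"
    )
    return counts.most_common(n)


state_counts = successes_by_state(projects)
state_rates = per_million(state_counts, population)
best_cities = top_cities(projects, 2)
```

In this toy data the raw count favors state AA while the per-million rate favors BB, which is why normalizing can change which states appear darkest on a choropleth.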
Provide a word cloud of the most frequent or important words (the choice of frequency measure is yours) among the most successful projects.
For the word cloud, I use the term frequency scores from a corpus created from the blurbs of successful projects.
The second word cloud is generated using the term frequency scores from a corpus created from the blurbs of failed projects. It is included only to show the difference between the two word clouds.
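The term frequency scores that feed a word cloud are plain counts of how often each term occurs across the blurbs. The following is a minimal Python sketch of that counting step; the tokenizer, the short stop-word list, and the sample blurbs are assumptions made for illustration and not the preprocessing used in the report.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "is", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_frequencies(blurbs):
    """Count how often each term occurs across all blurbs."""
    counts = Counter()
    for blurb in blurbs:
        counts.update(tokenize(blurb))
    return counts


# Made-up blurbs for illustration.
successful_blurbs = [
    "A new album of original music",
    "An illustrated book for the curious",
    "New music for a new film",
]

tf = term_frequencies(successful_blurbs)
top_terms = tf.most_common(2)
```

Running the same function on the failed projects' blurbs gives the second set of frequencies, so the two clouds are built in exactly the same way.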
Provide a pyramid plot to show how word frequencies differ between successful and unsuccessful projects. A selection of 10 to 20 top words is sufficient here.
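A pyramid plot needs, for each selected word, its frequency in both groups. The following is a minimal Python sketch of one way to pick those words: it keeps words that occur in both groups and ranks them by absolute difference in frequency. This selection rule, the function names, and the sample counts are assumptions of the sketch; the text rendering only stands in for a real plot.

```python
from collections import Counter


def pyramid_rows(success_tf, failure_tf, n=10):
    """Words present in both groups, ranked by absolute frequency difference."""
    common = set(success_tf) & set(failure_tf)
    rows = [(w, success_tf[w], failure_tf[w]) for w in common]
    # Largest absolute difference first; ties are broken alphabetically.
    rows.sort(key=lambda r: (-abs(r[1] - r[2]), r[0]))
    return rows[:n]


def render_pyramid(rows):
    """Back-to-back text bars: successful on the left, unsuccessful on the right.

    Bars use raw counts, so this only suits small toy counts.
    """
    left = max((s for _, s, _ in rows), default=0)
    label = max((len(w) for w, _, _ in rows), default=0)
    lines = []
    for word, s, f in rows:
        lines.append(("#" * s).rjust(left) + " " + word.center(label) + " " + "#" * f)
    return "\n".join(lines)


# Made-up term frequencies for illustration.
success_tf = Counter({"music": 5, "new": 4, "film": 1, "album": 2})
failure_tf = Counter({"music": 2, "new": 4, "film": 3, "app": 6})

rows = pyramid_rows(success_tf, failure_tf, n=2)
chart = render_pyramid(rows)
```

Words that appear in only one group are dropped here because they have no counterpart bar; including them with a zero count would be an equally defensible choice.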
Based on the readability plot, projects with denser blurbs, i.e. more complex text as measured by the Flesch-Kincaid (FK) grade level, tend to be less successful. More complex language therefore does not appear to help the achievement ratio.
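The FK grade level referred to above is computed from average sentence length and average syllables per word: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The following is a minimal Python sketch of that formula; the sentence splitter and the vowel-group syllable counter are rough approximations assumed for illustration, so the scores will differ somewhat from those of a dedicated readability library.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: the number of consecutive-vowel groups.

    This heuristic is an assumption of the sketch, not an exact count.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text):
    """Flesch-Kincaid grade level of a text, or None if it has no words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Made-up example texts.
simple = "The cat sat on the mat."
dense = "Interdisciplinary collaborative installations demonstrate extraordinary possibilities."

simple_grade = fk_grade(simple)
dense_grade = fk_grade(dense)
```

Longer sentences and longer words both push the grade up, which is what "denser" means in this context.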
The achievement ratio differs little by sentiment. This may cut both ways: positive words can generate enthusiasm, while negative words can appeal to compassion. Negative words may also appear in blurbs that describe the background of social problems, such as sexual violence or suicide, to raise funds for projects that address them.
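One simple way to score the tone of a blurb, assumed here for illustration rather than taken from the report, is to count positive and negative words and take their normalized difference. The following is a minimal Python sketch; the two tiny word lists are hypothetical and stand in for a real sentiment dictionary.

```python
import re

# Tiny hypothetical lexicons for illustration; not a published sentiment dictionary.
POSITIVE = {"love", "great", "beautiful", "joy", "hope"}
NEGATIVE = {"violence", "fear", "loss", "crisis", "pain"}


def polarity(text):
    """(positive - negative) / (positive + negative), or 0.0 if no lexicon words occur."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


# Made-up blurbs.
upbeat = polarity("A beautiful film about hope")
somber = polarity("A documentary on violence and loss, and on hope")
neutral = polarity("A board game")
```

The second blurb scores negative even though it describes a project addressing a problem, which illustrates why negative wording need not hurt a campaign.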
The achievement ratio falls as the number of words associated with positive emotions, such as trust and positive, increases. By contrast, a larger number of negative-emotion words in the blurb appears to be positively related to the achievement ratio. For instance, disgust and fear are associated with higher achievement ratios.
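The direction of such a relationship can be checked with a correlation between the per-blurb count of emotion words and the achievement ratio. The following is a minimal Python sketch using the Pearson coefficient, which is an assumed choice and not necessarily how the report's plots were produced; the numbers are made up to show the mechanics and are not results from the analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values: emotion-word counts per blurb and achievement ratios (percent).
emotion_word_counts = [0, 1, 2, 3]
falling_ratios = [140.0, 120.0, 100.0, 80.0]
rising_ratios = [80.0, 100.0, 120.0, 140.0]

negative_relation = pearson(emotion_word_counts, falling_ratios)
positive_relation = pearson(emotion_word_counts, rising_ratios)
```

A coefficient near -1 corresponds to the falling pattern described for trust, and one near +1 to the rising pattern described for disgust and fear.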